Interactive-Time Similarity Search for Large Image Collections Using Parallel VA-Files
Abstract
Nearest-neighbor search (NN-search) plays a key role in content-based retrieval. As a first contribution, this article shows that NN-search is a meaningful implementation of similarity search even when features are high-dimensional. However, NN-search over high-dimensional features has linear complexity, and query response times have not been satisfactory for large collections of multimedia objects. Building on the VA-File, this article investigates parallel NN-search in a Network of Workstations (NOW). It identifies and evaluates various design alternatives for such a search engine; these alternatives relate mainly to data placement and the division of work among components. We also use Amdahl’s law to predict the speedup and response times for a given data set and a given setup. Because of the scan-based nature of the VA-File, one might expect an improvement almost linear in the number of components. Yet the best speedup we have observed is almost 30 for a NOW with only three components, an effect that is due to the elimination of the IO bottleneck. From another perspective, our solution provides interactive-time similarity search: a search through 900 MB of feature data takes about one second in a NOW with three components.
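The abstract mentions using Amdahl's law to predict speedup and response time. As a rough illustration only, the sketch below implements the textbook form of the law; the exact variant and parameters used in the article are not given here, so the function names and the sample values (`parallel_fraction`, `single_node_seconds`) are hypothetical. Note that the textbook formula caps the speedup at the number of workers, so it cannot by itself account for the superlinear speedup the abstract attributes to removing the IO bottleneck.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Textbook Amdahl's law: speedup = 1 / ((1 - p) + p / n).

    parallel_fraction (p): share of the work that can be parallelized.
    workers (n): number of components sharing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def predicted_response_time(single_node_seconds: float,
                            parallel_fraction: float,
                            workers: int) -> float:
    """Predicted response time when the same work is spread over `workers`."""
    return single_node_seconds / amdahl_speedup(parallel_fraction, workers)


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration, not figures from the article.
    single_node_seconds = 30.0
    for p in (0.90, 0.99, 1.00):
        s = amdahl_speedup(p, workers=3)
        t = predicted_response_time(single_node_seconds, p, workers=3)
        print(f"p={p:.2f}  speedup={s:.2f}  predicted time={t:.2f}s")
```

With a fully parallel scan (`parallel_fraction = 1.0`) the formula yields a speedup equal to the number of workers; any serial share, such as merging partial results, lowers it.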
Related resources
Effective Browsing of Image Search Results with Visual and Diverse Summarization via Clustering
With unprecedented growth in the production of digital images and the use of multimedia references, the demand for image and subject search has increased. Systematic processing of this information is a basic prerequisite for its effective analysis, organization and management. Likewise, large collections of images have been made available on the Web and many search engines have provided the poss...
VA-Files vs. R*-Trees in Distance Join Queries
In modern database applications the similarity of complex objects is examined by performing distance-based queries (e.g. nearest neighbour search) on data of high dimensionality. Most multidimensional indexing methods have failed to efficiently support these queries in arbitrary high-dimensional datasets (due to the dimensionality curse). Similarity join queries and K closest pairs queries are ...
Faster Exact Histogram Intersection on Large Data Collections Using Inverted VA-Files
Most indexing structures for high-dimensional vectors used in multimedia retrieval today rely on determining the importance of each vector component at indexing time in order to create the index. However for Histogram Intersection and other important distance measures this is not possible because the importance of vector components depends on the query. We present an indexing structure inspired...
Similarity-based visualization of large image collections
Effective techniques for organizing and visualizing large image collections are in growing demand as visual search gets increasingly popular. Targeting an online astronomy archive with thousands of images, we present our solution for image search and clustering based on the evaluation of image similarity using both visual and textual information. Time-consuming image similarity computation is a...
High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation
Image segmentation is one of the most common steps in digital image processing. There are many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task due mainly to the noise, low contrast, and steep...